Upload X-ray Dataset

This guide provides instructions for uploading X-ray crystallography data to CryoCloud for automated structure refinement and analysis.

Overview

X-ray datasets in CryoCloud typically consist of diffraction experiments from a single synchrotron trip, provided either as raw images or as pre-processed datasets. Raw data is identified by *_master.h5 files or by a sufficient number of CBF files sharing a common file prefix. For pre-processed data, we support metadata extraction from autoPROC, xia2-* (variations of DIALS and 3D), and Grenades. Please reach out to us if you want to upload other pre-processed MTZ files.
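
To check locally whether a folder listing is likely to be recognized as raw data, the rule above can be approximated as follows. This is only a sketch and not CryoCloud's actual scanner: the function name find_raw_datasets, the threshold MIN_CBF_FRAMES (the guide only says "a sufficient number"), and the way the frame number is stripped from CBF file names are all assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical threshold: the guide does not state what counts as "sufficient".
MIN_CBF_FRAMES = 10


def find_raw_datasets(paths, min_cbf_frames=MIN_CBF_FRAMES):
    """Return sorted (directory, prefix) pairs that look like raw diffraction data."""
    found = set()
    cbf_counts = Counter()
    for raw_path in paths:
        p = PurePosixPath(raw_path)
        name = p.name
        if name.endswith("_master.h5"):
            # A single master file is enough to identify an HDF5 dataset.
            found.add((str(p.parent), name[: -len("_master.h5")]))
        elif name.lower().endswith(".cbf"):
            # Assumed naming: strip a trailing frame number such as "_00001"
            # to recover the prefix shared by all frames of one sweep.
            prefix = re.sub(r"[_.-]?\d+$", "", p.stem)
            cbf_counts[(str(p.parent), prefix)] += 1
    for key, count in cbf_counts.items():
        if count >= min_cbf_frames:
            found.add(key)
    return sorted(found)
```

Running it on a plain list of object paths (for example, the output of a bucket listing) shows which folders would count as raw data under these assumptions; a folder with only one or two stray CBF frames is skipped.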

Our platform provides access to the automated Pipedream pipeline by Global Phasing, which can refine structures and generate high-resolution models with minimal manual intervention.

Preparing Your Data

Before uploading, ensure your X-ray dataset includes:

Diffraction data - Raw or pre-processed diffraction data files from your beamline
- Pre-processing metadata files (optional) - Any associated metadata files
Synchrotron freezing sheet - A spreadsheet in CSV or XLSX format that describes the collected data. A CSV template with the minimum required columns is available.
Reference structures - Reference coordinates in PDB format

Dataset Structure and Upload

General guidelines for dataset upload can be found on the Dataset page. Here we focus on specifics associated with X-ray data.

Data Organization

The dataset scanning method follows these steps:

Parse the metadata spreadsheet containing Crystal Position, Target Protein, and optional Ligand Information (ID and SMILES string). Alternatively, you can enter this information manually
Identify data subfolders in cloud storage (raw images and pre-processed data) based on common filenames
Match Crystal Positions to data using path matching logic: the string <puckID>_<crystal_position> is searched for in the identified file paths (or in the directory paths, in the case of raw data)
Extract metadata from pre-processing logs and unit cell parameters from reference PDB files:
- We look for any PDB files in the bucket starting with the Protein Acronym/Target found in the CSV/XLSX
Create dataset entries by pairing ligands with their experimental data
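
The matching steps above can be sketched in a few lines, which may help when deciding how to name folders before upload. This is an illustrative approximation and not the platform's implementation: the function names match_crystal and find_reference_pdbs are hypothetical, and the rule that the key must not be followed by another digit (so that PK7_3 does not also match PK7_31) is an assumption added here.

```python
import re
from pathlib import PurePosixPath


def match_crystal(puck_id, crystal_position, paths, raw=False):
    """Return the paths that contain <puckID>_<crystal_position>.

    For raw data only the directory part of each path is searched.
    """
    key = f"{puck_id}_{crystal_position}"
    # Assumption: reject hits that continue with a digit (PK7_3 vs PK7_31).
    pattern = re.compile(re.escape(key) + r"(?!\d)")
    hits = []
    for raw_path in paths:
        p = PurePosixPath(raw_path)
        haystack = str(p.parent) if raw else str(p)
        if pattern.search(haystack):
            hits.append(raw_path)
    return hits


def find_reference_pdbs(target, paths):
    """Return PDB files whose file name starts with the target acronym."""
    hits = []
    for raw_path in paths:
        name = PurePosixPath(raw_path).name
        if name.startswith(target) and name.lower().endswith(".pdb"):
            hits.append(raw_path)
    return hits
```

With this logic, a folder such as processed/PK7_3/ is picked up for puck PK7, position 3, while a raw master file named PK7_3_master.h5 stored in an unrelated folder is not, because only directory paths are considered for raw data.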

After ensuring your uploaded data follows these guidelines, click Save to proceed. After a few moments, you should see your dataset populated.

X-ray dataset view showing populated data subsets

Next Steps

Once your X-ray dataset is uploaded and configured, you can create a Workflow in the Workflows tab. Currently, a workflow only needs a single XRayRefine Job node: add it, adjust the desired parameters, and then go to the Projects tab to start a Project from the Workflow you just created.

Troubleshooting

Upload Issues:

If an expected subset is missing:

Ensure the spreadsheet specifies both the Crystal Position and the Protein Target.
Upload at least one reference PDB model for each Protein Target.
Confirm that the pre-processing software used is supported (metadata is extracted from its log files).
Check for duplicate crystal position records in your spreadsheet.
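
Two of these checks (missing fields and duplicate crystal positions) can be run on the CSV before uploading. The sketch below is a minimal example under stated assumptions: the column headers "Crystal Position" and "Target Protein" are taken from the field names used in this guide and may differ from the actual template, the Crystal Position value is assumed to be unique across the whole sheet, and check_sheet is a hypothetical helper, not part of CryoCloud.

```python
import csv
import io
from collections import Counter

# Assumed header names; adjust them to match the actual template.
REQUIRED = ("Crystal Position", "Target Protein")


def check_sheet(csv_text):
    """Return a list of human-readable problems found in the spreadsheet."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(rows, start=2):
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                problems.append(f"row {line_no}: missing {col}")
    # Count how often each non-empty crystal position appears.
    counts = Counter((row.get("Crystal Position") or "").strip() for row in rows)
    for position, count in sorted(counts.items()):
        if position and count > 1:
            problems.append(f"duplicate Crystal Position {position!r} ({count} rows)")
    return problems
```

An empty result means neither problem was found; it does not confirm that reference PDB files or supported pre-processing logs are present, which still need to be checked separately.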

Need Help?

If you encounter issues with X-ray dataset upload or have questions about data formats:

Contact CryoCloud support
Refer to our X-ray Tutorial for a complete workflow example
Check our Support page for additional resources
